Searching from Area to Point: A Hierarchical Framework for Semantic-Geometric Combined Feature Matching
Feature matching is a crucial technique in computer vision. Essentially, it
can be viewed as a search problem: establishing correspondences between
images. The key challenge in this task lies in the lack of a well-defined
search space, which leads current methods to inaccurate point matching. In
pursuit of a reasonable matching search space, this paper introduces a
hierarchical feature matching framework: Area to Point Matching (A2PM), to
first find semantic area matches between images, and then perform point
matching on area matches, thus setting the search space as the area matches
with salient features to achieve high matching precision. This well-chosen
search space of the A2PM framework also alleviates the accuracy limitation of
state-of-the-art Transformer-based matching methods. To realize this framework,
we further propose Semantic and Geometry Area Matching (SGAM) method, which
utilizes semantic prior and geometry consistency to establish accurate area
matches between images. By integrating SGAM with off-the-shelf
Transformer-based matchers, our feature matching methods, built on the A2PM
framework, achieve encouraging precision improvements over present
state-of-the-art methods in extensive point matching and pose estimation experiments.
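A minimal sketch may help convey the area-to-point idea: first pair semantic areas across the two images, then restrict point matching to each matched area pair. Everything below (the dict layout, the label-equality criterion, the nearest-neighbour matcher) is illustrative and not SGAM's actual algorithm.

```python
def match_areas(areas_a, areas_b):
    """Stage 1: pair areas that share a semantic label (toy criterion)."""
    matches = []
    for i, a in enumerate(areas_a):
        for j, b in enumerate(areas_b):
            if a["label"] == b["label"]:
                matches.append((i, j))
    return matches

def match_points(points_a, points_b):
    """Stage 2: nearest-neighbour point matching inside one area pair."""
    pairs = []
    for p in points_a:
        best = min(points_b,
                   key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
        pairs.append((p, best))
    return pairs

def a2pm(areas_a, areas_b):
    """Full pipeline: area matches define the point-matching search space."""
    result = []
    for i, j in match_areas(areas_a, areas_b):
        result += match_points(areas_a[i]["points"], areas_b[j]["points"])
    return result
```

Because stage 2 only ever searches inside a matched area, the point matcher never considers correspondences across semantically unrelated regions, which is the framework's claimed source of precision.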
Learning Discriminative Visual-Text Representation for Polyp Re-Identification
Colonoscopic Polyp Re-Identification aims to match a specific polyp in a
large gallery across different cameras and views, and plays a key role in the
prevention and treatment of colorectal cancer in computer-aided diagnosis.
However, traditional methods mainly focus on visual representation
learning while neglecting the potential of semantic features during
training, which easily leads to poor generalization capability when the
pretrained model is adapted to new scenarios. To relieve this dilemma, we
propose a simple but effective training method named VT-ReID, which can
remarkably enrich the representation of polyp videos with the interchange of
high-level semantic information. Moreover, we design a novel
clustering mechanism to introduce prior knowledge from textual data, which
leverages contrastive learning to promote better separation using abundant
unlabeled text data. To the best of our knowledge, this is the first attempt to
employ the visual-text feature with clustering mechanism for the colonoscopic
polyp re-identification. Empirical results show that our method significantly
outperforms current state-of-the-art methods by a clear margin.
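As a rough illustration of the contrastive component (the abstract does not give VT-ReID's exact loss), an InfoNCE-style objective pulls a visual feature toward its paired textual feature and pushes it away from other texts:

```python
import math

def info_nce(pos_sim, all_sims, temperature=0.07):
    """InfoNCE loss for one anchor: -log softmax over similarity scores.

    `pos_sim` is the similarity to the matching (e.g. textual) feature;
    `all_sims` contains it plus similarities to negatives.
    Hypothetical helper, not the paper's exact formulation.
    """
    logits = [s / temperature for s in all_sims]
    m = max(logits)  # log-sum-exp trick for numerical stability
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - pos_sim / temperature
```

Lower loss means the positive pair already dominates the softmax, i.e. the visual and textual features of the same polyp are well separated from the rest.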
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval
Colonoscopic video retrieval, which is a critical part of polyp treatment,
has great clinical significance for the prevention and treatment of colorectal
cancer. However, retrieval models trained on action recognition datasets
usually produce unsatisfactory retrieval results on colonoscopic datasets due
to the large domain gap between them. To seek a solution to this problem, we
construct a large-scale colonoscopic dataset named Colo-Pair for medical
practice. Based on this dataset, a simple yet effective training method called
Colo-SCRL is proposed for more robust representation learning. It aims to
refine general knowledge from colonoscopies through masked autoencoder-based
reconstruction and momentum contrast to improve retrieval performance. To the
best of our knowledge, this is the first attempt to employ the contrastive
learning paradigm for medical video retrieval. Empirical results show that our
method significantly outperforms current state-of-the-art methods in the
colonoscopic video retrieval task.
Comment: Accepted by ICME 202
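The momentum-contrast ingredient mentioned above maintains a slowly moving copy of the encoder whose outputs serve as stable contrastive keys. A toy version, with "weights" as a dict of floats standing in for network parameters:

```python
def momentum_update(query_weights, key_weights, m=0.999):
    """EMA update used in momentum contrast: the key encoder trails the
    query encoder with momentum m. Toy sketch, not Colo-SCRL's code."""
    return {k: m * key_weights[k] + (1.0 - m) * query_weights[k]
            for k in key_weights}
```

The large momentum keeps the key encoder nearly frozen between steps, so features already stored in the contrastive queue remain consistent with freshly computed keys.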
Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification
Colonoscopic Polyp Re-Identification aims to match the same polyp from a
large gallery with images from different views taken using different cameras
and plays an important role in the prevention and treatment of colorectal
cancer in computer-aided diagnosis. However, traditional object ReID
methods that directly adopt CNN models trained on ImageNet usually
produce unsatisfactory retrieval performance on colonoscopic datasets due to
the large domain gap. Additionally, these methods neglect to explore the
potential of self-discrepancy among intra-class relations in the colonoscopic
polyp dataset, which remains an open research problem in the medical community.
To solve this dilemma, we propose a simple but effective training method named
Colo-ReID, which helps the model learn more general and discriminative
knowledge via a meta-learning strategy in scenarios with few samples.
Based on this, a dynamic Meta-Learning Regulation mechanism called MLR is
introduced to further boost the performance of polyp re-identification. To the
best of our knowledge, this is the first attempt to leverage the meta-learning
paradigm instead of traditional machine learning to effectively train deep
models in the task of colonoscopic polyp re-identification. Empirical results
show that our method significantly outperforms current state-of-the-art methods
by a clear margin.
Comment: arXiv admin note: text overlap with arXiv:2307.1062
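The abstract does not specify the meta-learning variant, but the inner-adapt/outer-update structure common to such strategies can be sketched Reptile-style on scalar "weights" (purely illustrative, not the MLR mechanism):

```python
def inner_adapt(w, grad_fn, lr=0.1, steps=3):
    """Inner loop: a few gradient steps on a single task."""
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

def meta_step(w, task_grads, inner_lr=0.1, outer_lr=0.5):
    """Outer loop (Reptile-style): move the meta-weights toward the
    average of the task-adapted weights."""
    adapted = [inner_adapt(w, g, inner_lr) for g in task_grads]
    return w + outer_lr * (sum(adapted) / len(adapted) - w)
```

Each simulated few-shot "task" pulls the shared initialization toward a point from which all tasks adapt quickly, which is the intuition behind using meta-learning in low-sample polyp ReID.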
Noncontact position measurement system using optical sensors
Optical and computational components are combined to form a high-precision, six-degree-of-freedom, single-sided, noncontact position measurement system. Reflective optical targets are provided on a target object whose position is to be sensed. Light beams are directed toward the optical targets, producing reflected beams. Electrical signals are produced indicating the points of intersection of the reflected beams and the position-sensitive detectors, which may be lateral-effect photodiodes. The signals are transformed to provide measurements of translation along, and rotation around, three nonparallel axes which define the space in which the target object moves.
Board of Regents, University of Texas System
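The final transformation step can be illustrated with a linearised model: assume the six detector readings relate to the 6-DOF pose through a known calibration matrix, so pose recovery becomes a least-squares solve. This is a sketch of the idea under that assumption, not the patented algorithm; the calibration matrix here is a random stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((6, 6))  # assumed-known calibration matrix (illustrative)

def pose_from_readings(readings):
    """Recover (tx, ty, tz, rx, ry, rz) from six detector readings
    under a linear measurement model: readings = C @ pose."""
    pose, *_ = np.linalg.lstsq(C, readings, rcond=None)
    return pose
```

In a real system the mapping from detector spots to pose is nonlinear and calibrated per device, but the linear version shows why six independent readings suffice for six degrees of freedom.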
Deep Multimodal Fusion for Generalizable Person Re-identification
Person re-identification plays a significant role in realistic scenarios due
to its various applications in public security and video surveillance.
Recently, leveraging supervised or semi-/unsupervised learning paradigms,
which benefit from large-scale datasets and strong computing resources,
has achieved competitive performance on specific target domains. However,
when Re-ID models are directly deployed in a new domain without target samples,
they often suffer from considerable performance degradation and poor domain
generalization. To address this challenge, in this paper, we propose DMF, a
Deep Multimodal Fusion network for the general scenarios on person
re-identification task, where rich semantic knowledge is introduced to assist
in feature representation learning during the pre-training stage. On top of it,
a multimodal fusion strategy is introduced to translate the data of different
modalities into the same feature space, which can significantly boost the
generalization capability of the Re-ID model. In the fine-tuning stage, a
realistic dataset is adopted to fine-tune the pre-trained model for alignment
with the real-world data distribution. Comprehensive experiments on benchmarks demonstrate
that our proposed method can significantly outperform previous domain
generalization or meta-learning methods. Our source code will also be publicly
available at https://github.com/JeremyXSC/DMF
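A schematic of the fusion idea, with random matrices standing in for the learned projection heads (dimensions and the normalise-then-average fusion rule are illustrative assumptions, not DMF's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_SHARED = 8, 6, 4
W_img = rng.standard_normal((D_IMG, D_SHARED))  # learned in practice
W_txt = rng.standard_normal((D_TXT, D_SHARED))  # learned in practice

def fuse(img_feat, txt_feat):
    """Project each modality into the shared space, L2-normalise,
    and average, yielding one joint representation."""
    zi = img_feat @ W_img
    zt = txt_feat @ W_txt
    zi = zi / np.linalg.norm(zi)
    zt = zt / np.linalg.norm(zt)
    return (zi + zt) / 2.0
```

Normalising before averaging keeps either modality from dominating the fused vector, which is one common way to make a shared space usable downstream.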
Novel Graphene Biosensor Based on the Functionalization of Multifunctional Nano-BSA for the Highly Sensitive Detection of Cancer Biomarker
A simple, convenient, and highly sensitive bio-interface for graphene field-effect transistors (GFETs) based on multifunctional nano-denatured bovine serum albumin (nano-dBSA) functionalization was developed to target cancer biomarkers. The novel graphene–protein bioelectronic interface was constructed by heating to denature native BSA on the graphene substrate surface. The formed nano-dBSA film served as the cross-linker to immobilize monoclonal antibody against carcinoembryonic antigen (anti-CEA mAb) on the graphene channel activated by EDC and Sulfo-NHS. The nano-dBSA film worked as a self-protecting layer of graphene to prevent surface contamination by lithographic processing. The improved GFET biosensor exhibited good specificity and high sensitivity toward the target at an ultralow concentration of 337.58 fg mL⁻¹. The electrical detection of the binding of CEA followed the Hill model for ligand–receptor interaction, indicating negative binding cooperativity between CEA and anti-CEA mAb with a dissociation constant of 6.82 × 10⁻¹⁰ M. The multifunctional nano-dBSA functionalization can confer a new function to graphene-like 2D nanomaterials and provides a promising bio-functionalization method for clinical application in biosensing, nanomedicine, and drug delivery.
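The Hill model referenced above relates fractional receptor occupancy to ligand concentration, with a Hill coefficient n < 1 expressing negative cooperativity. A quick numerical check; only the dissociation constant comes from the text, while the exponent is an arbitrary illustrative value:

```python
def hill_occupancy(conc, kd, n):
    """Fraction of receptor sites bound under the Hill model:
    theta = conc**n / (kd**n + conc**n)."""
    return conc ** n / (kd ** n + conc ** n)

KD = 6.82e-10  # reported CEA/anti-CEA mAb dissociation constant (M)
```

A useful sanity property: at conc = K_d the occupancy is exactly 0.5 regardless of n, and with n < 1 the binding curve rises more gradually around that point than a non-cooperative (n = 1) curve would.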
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Generalizable person re-identification (Re-ID) is an active research topic
in machine learning and computer vision, which plays a significant role in
realistic scenarios due to its various applications in public security and
video surveillance. However, previous methods mainly focus on visual
representation learning while neglecting the potential of semantic
features during training, which easily leads to poor generalization capability
when adapted to a new domain. In this paper, we propose a Multi-Modal
Equivalent Transformer called MMET for more robust visual-semantic embedding
learning on visual, textual, and visual-textual tasks. To further
enhance robust feature learning in the transformer context, a dynamic
masking mechanism called the Masked Multimodal Modeling strategy (MMM) is
introduced to mask both image patches and text tokens; it works jointly on
multimodal or unimodal data and significantly boosts the
performance of generalizable person Re-ID. Extensive experiments on benchmark
datasets demonstrate the competitive performance of our method over previous
approaches. We hope this method could advance the research towards
visual-semantic representation learning. Our source code is also publicly
available at https://github.com/JeremyXSC/MMET
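The MMM strategy described above can be sketched as uniform random masking applied to either token stream; the masking ratio and mask symbol below are illustrative assumptions, not MMET's actual configuration:

```python
import random

MASK = "<mask>"

def mask_sequence(tokens, ratio=0.3, seed=0):
    """Replace a random subset of tokens (image patches or text tokens)
    with a mask symbol; returns the masked sequence and masked indices.
    The model is then trained to reconstruct the masked positions."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * ratio))
    idx = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in idx else t for i, t in enumerate(tokens)]
    return masked, sorted(idx)
```

Because the same masking function applies to either modality, the model can be trained on image-only, text-only, or paired batches, which matches the abstract's claim that MMM works on multimodal or unimodal data.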